13 research outputs found

    A deep neural network approach to predicting clinical outcomes of neuroblastoma patients

    Background: The availability of high-throughput omics datasets from large patient cohorts has allowed the development of methods that aim to predict patient clinical outcomes, such as survival and disease recurrence. Such methods are also important for better understanding the biological mechanisms underlying disease etiology and development, as well as treatment responses. Recently, different predictive models, relying on distinct algorithms (including Support Vector Machines and Random Forests), have been investigated. In this context, deep learning strategies are of special interest due to their demonstrated superior performance over a wide range of problems and datasets. One of the main challenges of such strategies is the “small n large p” problem: omics datasets typically consist of small numbers of samples and large numbers of features relative to typical deep learning datasets. Neural networks usually tackle this problem through feature selection or by including additional constraints during the learning process.
    Methods: We propose to tackle this problem with a novel strategy that relies on a graph-based method for feature extraction, coupled with a deep neural network for clinical outcome prediction. The omics data are first represented as graphs whose nodes represent patients and whose edges represent correlations between the patients’ omics profiles. Topological features, such as centralities, are then extracted from these graphs for every node. Lastly, these features are used as input to train and test various classifiers.
    Results: We apply this strategy to four neuroblastoma datasets and observe that models based on neural networks are more accurate than state-of-the-art models (DNN: 85%-87%, SVM/RF: 75%-82%). We explore how different parameters and configurations are selected in order to overcome the effects of the small data problem as well as the curse of dimensionality.
    Conclusions: Our results indicate that deep neural networks capture complex features in the data that help predict patient clinical outcomes.
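The graph-based feature extraction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how correlations are turned into edges or which centralities are used, so the Pearson correlation, the `threshold` parameter, the choice of degree and closeness centrality, and all function names are assumptions made for the example.

```python
from collections import deque
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long omics profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def patient_graph(profiles, threshold=0.8):
    """Nodes are patients; an edge joins two patients whose profiles
    correlate at or above `threshold` (an assumed rule)."""
    n = len(profiles)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(profiles[i], profiles[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Fraction of the other patients each patient is connected to."""
    n = len(adj)
    return {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in adj.items()}


def closeness_centrality(adj):
    """Closeness from breadth-first distances, scaled by the fraction of
    reachable nodes so that isolated patients score 0."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        reachable = len(dist) - 1
        total = sum(dist.values())
        result[source] = (reachable / total) * (reachable / (n - 1)) if total else 0.0
    return result


def node_features(adj):
    """One topological feature vector [degree, closeness] per patient."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    return [[deg[v], clo[v]] for v in sorted(adj)]
```

The rows returned by `node_features` would then replace the raw omics profiles as classifier input, which is how the strategy reduces a very wide feature space to a handful of topological descriptors per patient.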

    Gene prioritization and clustering by multi-view text mining

    Background: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Incorporating more text mining models may therefore help to obtain more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model.
    Results: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view, and the multiple views obtained are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the competing methods.
    Conclusions: In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

    Collaboratively charting the gene-to-phenotype network of human congenital heart defects

    Background: How to efficiently integrate the daily practice of molecular biologists, geneticists, and clinicians with the emerging computational strategies from systems biology is still very much an open question.
    Description: We built on the recent advances in Wiki-based technologies to develop a collaborative knowledge base and gene prioritization portal aimed at mapping genes and genomic regions, and untangling their relations with corresponding human phenotypes, congenital heart defects (CHDs). This portal is not only an evolving community repository of current knowledge on the genetic basis of CHDs, but also a collaborative environment for the study of candidate genes potentially implicated in CHDs, in particular by integrating recent strategies for the statistical prioritization of candidate genes. It thus serves and connects the broad community that is facing CHDs, ranging from the pediatric cardiologist and clinical geneticist to the basic investigator of cardiogenesis.
    Conclusions: This study describes the first specialized portal to collaboratively annotate and analyze gene-phenotype networks. Of broad interest to the biological community, we argue that such portals will play a significant role in systems biology studies of numerous complex biological processes. CHDWiki is accessible at http://www.esat.kuleuven.be/~bioiuser/chdwiki.

    iPSC-Derived Microglia as a Model to Study Inflammation in Idiopathic Parkinson's Disease.

    Parkinson's disease (PD) is a neurodegenerative disease with unknown cause in the majority of patients, who are therefore considered "idiopathic" (IPD). PD predominantly affects dopaminergic neurons in the substantia nigra pars compacta (SNpc), yet the pathology is not limited to this cell type. Advancing age is considered the main risk factor for the development of IPD and greatly influences the function of microglia, the immune cells of the brain. With increasing age, microglia become dysfunctional and release pro-inflammatory factors into the extracellular space, which promote neuronal cell death. Accordingly, neuroinflammation has also been described as a feature of PD. So far, studies exploring inflammatory pathways in IPD patient samples have primarily focused on blood-derived immune cells or brain sections, but have rarely investigated patient microglia in vitro. We therefore decided to explore the contribution of microglia to IPD in a comparative manner using both iPSC-derived cultures and postmortem tissue. Our meta-analysis of published RNAseq datasets indicated an upregulation of IL10 and IL1B in nigral tissue from IPD patients. We observed increased expression levels of these cytokines in microglia compared to neurons using our single-cell midbrain atlas. Moreover, IL10 and IL1B were upregulated in IPD compared to control microglia. Next, to validate these findings in vitro, we generated IPD patient microglia from iPSCs using an established differentiation protocol. IPD microglia were more readily primed, as indicated by elevated IL1B and IL10 gene expression and higher mRNA and protein levels of NLRP3 after LPS treatment. In addition, IPD microglia had higher phagocytic capacity under basal conditions, a phenotype that was further exacerbated upon stimulation with LPS, suggesting an aberrant microglial function. Our results demonstrate the significance of microglia as the key player in the neuroinflammation process in IPD. While our study highlights the importance of microglia-mediated inflammatory signaling in IPD, further investigations will be needed to explore particular disease mechanisms in these cells.

    Integrating Computational Biology and Forward Genetics in Drosophila

    Genetic screens are powerful methods for the discovery of gene–phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of “omics” data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, a web application for functional prioritization in Drosophila called Endeavour-HighFly, and the Atonal network are publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the process of gene function and gene–gene association discovery.

    L2-norm multiple kernel learning and its application to biomedical data fusion

    Background: This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have advantages over sparse integration methods for thoroughly combining complementary information in heterogeneous data sources.
    Results: We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem and the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large-scale data sets.
    Conclusions: This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM-based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has performance comparable to that of the conventional SVM MKL algorithms. Moreover, large-scale numerical experiments indicate that, when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.
    Availability: The MATLAB code of the algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.
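The contrast this abstract draws between sparse, averaging, and non-sparse kernel weights can be illustrated with a small sketch. It does not solve the MKL optimization problem described in the paper: the per-kernel `scores` are hypothetical stand-ins for whatever relevance values an actual solver would produce, and the three weighting functions are assumptions chosen only to show how the resulting combined kernel differs.

```python
from math import sqrt


def combine_kernels(kernels, weights):
    """Weighted sum of equally sized kernel matrices (lists of lists)."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def winner_takes_all_weights(scores):
    """Sparse weighting: all weight goes to the highest-scoring kernel."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


def uniform_weights(scores):
    """Averaging: every kernel gets the same weight."""
    return [1.0 / len(scores)] * len(scores)


def l2_normalized_weights(scores):
    """Non-sparse weighting: scores rescaled to unit L2 norm, so every
    kernel with a non-zero score keeps a non-zero weight."""
    norm = sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores]
```

With hypothetical scores `[3.0, 4.0]`, the sparse scheme discards the first data source entirely, whereas the L2-normalized scheme keeps both with weights proportional to their scores, which is the behavior the abstract argues for when most sources are believed to be relevant.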

    Systems level analysis of sex-dependent gene expression changes in Parkinson’s disease

    Parkinson’s disease (PD) is a heterogeneous disorder, and among the factors which influence the symptom profile, biological sex has been reported to play a significant role. While males have a higher age-adjusted disease incidence and are more frequently affected by muscle rigidity, females present more often with disabling tremors. The molecular mechanisms involved in these differences are still largely unknown, and an improved understanding of the relevant factors may open new avenues for pharmacological disease modification. To help address this challenge, we conducted a meta-analysis of disease-associated molecular sex differences in brain transcriptomics data from case/control studies. Both sex-specific changes (alterations in only one sex) and sex-dimorphic changes (changes in both sexes, but in opposite directions) were identified. Using further systems-level pathway and network analyses, coordinated sex-related alterations were studied. These analyses revealed significant disease-associated sex differences in mitochondrial pathways and highlighted specific regulatory factors whose activity changes can explain downstream network alterations, propagated through gene regulatory cascades. Single-cell expression data analyses confirmed the main pathway-level changes observed in bulk transcriptomics data. Overall, our analyses revealed significant sex disparities in PD-associated transcriptomic changes, resulting in coordinated modulations of molecular processes. Among the regulatory factors involved, NR4A2 has already been reported to harbour rare mutations in familial PD, and its pharmacological activation confers neuroprotective effects in toxin-induced models of Parkinsonism. Our observations suggest that NR4A2 may warrant further research as a potential adjuvant therapeutic target to address a subset of pathological molecular features of PD that display sex-associated profiles.
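The two categories defined in this abstract, sex-specific and sex-dimorphic changes, can be expressed as a simple decision rule. The sketch below is only an illustration of those definitions: the function name, the use of log fold changes with per-sex significance flags as inputs, and the extra "shared" and "unchanged" labels are assumptions, not part of the study's pipeline.

```python
def classify_sex_pattern(lfc_female, lfc_male, sig_female, sig_male):
    """Label a gene from its case-vs-control log fold change in each sex.

    sig_female / sig_male are booleans saying whether the change passed
    whatever significance criterion is in use (an assumed input).
    """
    if sig_female and sig_male:
        # Altered in both sexes: dimorphic only if the directions differ.
        if lfc_female * lfc_male < 0:
            return "sex-dimorphic"
        return "shared"  # same direction in both sexes (assumed label)
    if sig_female:
        return "female-specific"
    if sig_male:
        return "male-specific"
    return "unchanged"  # no significant change in either sex (assumed label)
```

For instance, a gene that is significantly up in female patients and significantly down in male patients would be labeled sex-dimorphic, while one that changes significantly only in males would be labeled male-specific.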